-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
statistics: ease the impact of stats feedback on cluster (#15503) #18770
statistics: ease the impact of stats feedback on cluster (#15503) #18770
Conversation
/run-all-tests |
@eurekaka please accept the invitation then you can push to the cherry-pick pull requests. |
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
e52efc7
to
b037eaf
Compare
/run-unit-test |
1 similar comment
/run-unit-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/merge |
/run-all-tests |
cherry-pick #15503 to release-3.0
What problem does this PR solve?
Fix #17478
Problem Summary:
Statistics feedback would impose periodical read/write burden on the database. Each TiDB would dump the feedbacks collected on this instance into TiKV every 10 mins, and the stats owner TiDB instance would read the feedbacks dumped every 15 seconds. If the stats owner TiDB finds there are new feedbacks from TiKV, it would merge the feedbacks with statistics in cache, then dump all these updated statistics into TiKV. This dump operation is pretty heavy if there are bunches of feedbacks on bunches of columns/indexes, since it can be treated as a light-weight ANALYZE on a lot of tables.
What is changed and how it works?
What's Changed:
First, reduce the amount of feedbacks generated on each TiDB by:
MaxQueryFeedbackCount
;Second, merge multiple insert/update/delete statements of dumping statistics into single ones, to reduce the unnecessary function call stacks and RPCs.How it Works:
Obviously, the first change can flow-control the statistics feedback mechanism fundamentally, but we may lose some stats accuracy incurred by feedback theoretically.
The second change combines several small transactions into a big one, since we have controlled the amount of feedbacks using the first change, I guess this bigger transaction is supposed not to be a problem. However, the bigger transaction should have higher chances of write conflict, and it makes code harder to read, so I haven't made up my mind to keep it or not actually.Related changes
Check List
Tests
Side effects
Release note